maximum entropy discrimination
Maximum Entropy Discrimination
We present a general framework for discriminative estimation based on the maximum entropy principle and its extensions. All calculations involve distributions over structures and/or parameters rather than specific settings and reduce to relative entropy projections. This holds even when the data is not separable within the chosen parametric class, in the context of anomaly detection rather than classification, or when the labels in the training set are uncertain or incomplete. Support vector machines are naturally subsumed under this class and we provide several extensions. We are also able to estimate exactly and efficiently discriminative distributions over tree structures of class-conditional models within this framework.
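For orientation, the "relative entropy projection" named in the abstract is commonly written as the constrained problem below. This is a sketch of the standard presentation, not text from the abstract: the symbols $P_0$ (prior), $L(X_t \mid \Theta)$ (discriminant function), $\gamma_t$ (margin variables), and $\lambda_t$ (Lagrange multipliers) are assumed notation.

```latex
% Projection: stay close to the prior while satisfying the margin constraints on average
\min_{P}\; \mathrm{KL}\!\left(P \,\|\, P_0\right)
\quad \text{s.t.} \quad
\int P(\Theta,\gamma)\,\bigl[\, y_t\, L(X_t \mid \Theta) - \gamma_t \,\bigr]\, d\Theta\, d\gamma \;\ge\; 0,
\qquad t = 1,\dots,T

% Form of the solution, with nonnegative multipliers \lambda_t
P(\Theta,\gamma) \;=\; \frac{1}{Z(\lambda)}\; P_0(\Theta,\gamma)\,
\exp\!\Bigl(\sum_{t=1}^{T} \lambda_t \bigl[\, y_t\, L(X_t \mid \Theta) - \gamma_t \,\bigr]\Bigr)

% The multipliers are found by maximizing a concave objective
J(\lambda) \;=\; -\log Z(\lambda), \qquad \lambda_t \ge 0
```

Under these assumptions, a linear discriminant with a Gaussian prior on its weights yields a dual objective of the same form as the support vector machine dual, which is the sense in which SVMs are subsumed.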
Mistake Bounds for Maximum Entropy Discrimination
We establish a mistake bound for an ensemble method for classification based on maximizing the entropy of voting weights subject to margin constraints. The bound is the same as a general bound proved for the Weighted Majority Algorithm, and similar to bounds for other variants of Winnow. We prove a more refined bound that leads to a nearly optimal algorithm for learning disjunctions, again, based on the maximum entropy principle. We describe a simplification of the on-line maximum entropy method in which, after each iteration, the margin constraints are replaced with a single linear inequality. The simplified algorithm, which takes a similar form to Winnow, achieves the same mistake bounds.
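As a point of comparison for the Winnow-like form mentioned above, here is a minimal sketch of the textbook Winnow update (multiplicative promotion and demotion) learning a monotone disjunction. It is not the paper's maximum entropy algorithm; the function names, the threshold choice `theta = n`, and the toy target `x[0] OR x[3]` are assumptions made for illustration.

```python
from itertools import product


def predict(w, theta, x):
    """Predict 1 when the total weight of the active features reaches the threshold."""
    return 1 if sum(wi for wi, xi in zip(w, x) if xi) >= theta else 0


def winnow_train(examples, n, alpha=2.0, max_epochs=100):
    """Textbook Winnow: multiply active weights by alpha on a false negative,
    divide them by alpha on a false positive. Returns weights, threshold,
    and the total number of mistakes made during training."""
    w = [1.0] * n          # all features start with equal weight
    theta = float(n)       # assumed threshold choice
    mistakes = 0
    for _ in range(max_epochs):
        clean = True
        for x, y in examples:
            if predict(w, theta, x) == y:
                continue
            clean = False
            mistakes += 1
            factor = alpha if y == 1 else 1.0 / alpha
            # Only the weights of features active in this example change.
            w = [wi * factor if xi else wi for wi, xi in zip(w, x)]
        if clean:          # a full pass with no mistakes: stop
            break
    return w, theta, mistakes


# Toy problem (hypothetical): 8 boolean features, target is x[0] OR x[3].
n = 8
examples = [(x, 1 if (x[0] or x[3]) else 0) for x in product((0, 1), repeat=n)]
w, theta, mistakes = winnow_train(examples, n)
print("mistakes during training:", mistakes)
print("learned weights:", w)
```

Because the weights of relevant features are never demoted and can be promoted only a logarithmic number of times before reaching the threshold, the number of mistakes stays small relative to the number of features, which is the behavior the mistake bounds in the abstract quantify.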
Latent Laplacian Maximum Entropy Discrimination for Detection of High-Utility Anomalies
Hou, Elizabeth, Sricharan, Kumar, Hero, Alfred O.
Anomaly detection is a pervasive problem applicable to a variety of domains including network intrusion, fraud detection, and system failures. It is a crucial task in many applications because failure to detect anomalous activity could result in highly undesirable outcomes. For example, (i) detection of anomalous medical claims is important to identify fraud; (ii) detection of fraudulent credit card transactions is necessary to help prevent identity theft; and (iii) detection of abnormal network traffic is necessary to identify hacking. Many techniques have been developed for anomaly detection. These methods can be broadly classified into two categories: (i) rule-based systems, and (ii) statistical data-driven approaches. The rule-based systems are based on domain expertise and look for specific types of anomalies, while the data-driven approaches identify anomalies by finding statistically rare patterns. Examples of data-driven methods include parametric methods that assume a known family for the nominal (non-anomalous) distribution and nonparametric methods such as those using unsupervised or semi-supervised support vector machines (SVMs) [1], [2] or based on minimum volume set estimation [3], [4], [5]. The advantage of data-driven approaches over rule-based methods is that they can identify novel types of anomalies that are unknown to the domain expert.
- North America > United States > Michigan > Washtenaw County > Ann Arbor (0.14)
- North America > United States > California > Santa Clara County > Palo Alto (0.04)
- North America > Canada > Saskatchewan (0.04)
- (4 more...)
- Information Technology > Security & Privacy (1.00)
- Law Enforcement & Public Safety > Fraud (0.88)
- Banking & Finance (0.88)
- Information Technology > Data Science > Data Mining > Anomaly Detection (1.00)
- Information Technology > Artificial Intelligence > Representation & Reasoning > Rule-Based Reasoning (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Support Vector Machines (0.68)
Mistake Bounds for Maximum Entropy Discrimination
We establish a mistake bound for an ensemble method for classification based on maximizing the entropy of voting weights subject to margin constraints. The bound is the same as a general bound proved for the Weighted Majority Algorithm, and similar to bounds for other variants of Winnow. We prove a more refined bound that leads to a nearly optimal algorithm for learning disjunctions, again, based on the maximum entropy principle. We describe a simplification of the online maximum entropy method in which, after each iteration, the margin constraints are replaced with a single linear inequality. The simplified algorithm, which takes a similar form to Winnow, achieves the same mistake bounds.
- North America > United States > Pennsylvania > Allegheny County > Pittsburgh (0.04)
- North America > United States > New York > New York County > New York City (0.04)
- Asia > Singapore (0.04)
Maximum Entropy Discrimination
Jaakkola, Tommi, Meila, Marina, Jebara, Tony
We present a general framework for discriminative estimation based on the maximum entropy principle and its extensions. All calculations involve distributions over structures and/or parameters rather than specific settings and reduce to relative entropy projections. This holds even when the data is not separable within the chosen parametric class, in the context of anomaly detection rather than classification, or when the labels in the training set are uncertain or incomplete. Support vector machines are naturally subsumed under this class and we provide several extensions. We are also able to estimate exactly and efficiently discriminative distributions over tree structures of class-conditional models within this framework.
- North America > United States > Massachusetts > Middlesex County > Cambridge (0.16)
- Asia > Middle East > Jordan (0.04)
- Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Maximum Entropy (0.64)
- Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Support Vector Machines (0.59)
- Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (0.47)
- Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.47)